This project attempts to replicate the findings of Lima and Delen (2020), published in Government Information Quarterly (ABDC: A, ABS: 3, Q1). The authors did not make their original dataset publicly available, so I collected and reorganized the data from the respective sources. Some variables referenced in the paper are no longer accessible and could not be included in this replication. Consequently, the results of this project differ significantly from those reported in the original study.
As highlighted in the literature (Moody, Keister, and Ramos 2022), replicating social science research is often difficult because of factors such as data unavailability. Despite these challenges, this project showcases key data modeling techniques: acquiring data from multiple sources; merging, manipulating, and transforming data; and applying machine learning methods.
The original paper makes two theoretical contributions. First, it employs multiple prediction models alongside a heuristic method to assess variable importance, based on the ratio of candidate splits to splits in the Random Forest’s statistical output. This approach provides a nuanced understanding of variable significance. Second, it uses machine learning techniques to identify potential predictors of corruption at the country level, rather than the more commonly analyzed regional level.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(haven)
library(readxl)
library(readr)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(wbstats)
df1 <- read_csv(file.choose()) ## Ease of Doing Business data, used to extract the indicator codes
## New names:
## Rows: 41322 Columns: 22
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (4): Country Name, Country Code, Indicator Name, Indicator Code dbl (17): 2003,
## 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, ... lgl (1): ...22
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...22`
df1 <- df1 %>% clean_names() ## Standardize the variable names
indicators_codes <- unique(df1$indicator_code) ## Extract the indicator codes
DB_data <- wb_data(indicators_codes) ## Retrieve the indicators' values from the World Bank's API
efi.df <- read_excel(file.choose()) ## Load the Economic Freedom Index (Heritage Foundation)
cpi.df <- read_excel(file.choose()) ## Load the Corruption Perceptions Index (Transparency International)
educ.df <- read_excel(file.choose()) ## Load the Education Index
The datasets come from different sources and do not share a uniform structure. Before merging, each one must be reshaped into the same panel layout: one row per country and year, identified by an ISO3 country code.
# Data Preparation
## I begin with the dataset that needs the most reshaping: cpi.df
cpi.df <- cpi.df %>% clean_names() # Clean the names first to simplify the next steps
cpi_long <- cpi.df %>%
  pivot_longer(
    cols = starts_with("x"), # Select all columns starting with "x"
    names_to = "year",       # New column name for years
    values_to = "value"      # New column name for values
  ) %>%
  mutate(
    year = as.integer(gsub("x", "", year)) # Remove "x" and convert to integer
  ) %>%
  select(economy_iso3, economy_name, year, value) # Keep only the identifiers, the year, and the CPI value
### Rename columns in cpi_long for standardization purposes
cpi_long <- cpi_long %>%
  rename(
    iso3 = economy_iso3,
    country = economy_name,
    cpi = value
  )
## efi.df is already organized in a panel structure, but it lacks country codes, which are needed to merge the datasets, so they must be added.
## The variable names must be cleaned first
efi.df <- efi.df %>% clean_names()
library(countrycode)
efi.df$iso3 <- NULL
efi.df <- efi.df %>%
  mutate(iso3 = countrycode(country, "country.name", "iso3c"))
efi.df <- efi.df %>%
  mutate(iso3 = ifelse(country == "Kosovo", "XKX", iso3),     # Assign XKX to Kosovo
         iso3 = ifelse(country == "Micronesia", "FSM", iso3)) # Assign FSM to Micronesia
## The Education Index dataset must be converted into long format
educ_long <- educ.df %>%
  pivot_longer(
    cols = -country,         # Gather all columns except 'country'
    names_to = "year",       # New column holding the year
    values_to = "educ_index" # New column holding the index values
  )
### Country codes (iso3c) are missing and must be added:
educ_long <- educ_long %>%
  mutate(iso3 = countrycode(country, "country.name", "iso3c"))
educ_long <- educ_long %>%
  mutate(iso3 = ifelse(country == "Kosovo", "XKX", iso3),     # Assign XKX to Kosovo
         iso3 = ifelse(country == "Micronesia", "FSM", iso3), # Assign FSM to Micronesia
         iso3 = ifelse(country == "Chili", "CHL", iso3),      # "Chili" is the source file's spelling of Chile
         iso3 = ifelse(country == "Monte Negro", "MNE", iso3) # "Monte Negro" is the source file's spelling of Montenegro
  )
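The manual `ifelse()` fixes above could also be handled in a single call through the `custom_match` argument of `countrycode()`, which overrides the default conversion for the names it lists. The sketch below is only an illustration on a toy vector: `countries` and `overrides` are hypothetical names, and the override table would need to be extended with whatever spellings actually appear in the source files.

```r
library(countrycode)

# Toy input; "Chili" and "Monte Negro" mimic the non-standard spellings above.
countries <- c("France", "Kosovo", "Chili", "Monte Negro")

# Hypothetical override table: source spelling -> ISO3 code.
overrides <- c("Kosovo" = "XKX", "Chili" = "CHL", "Monte Negro" = "MNE")

iso3 <- countrycode(countries,
                    origin = "country.name",
                    destination = "iso3c",
                    custom_match = overrides)

# List any names that are still unmatched so they can be fixed by hand.
unmatched <- countries[is.na(iso3)]
```

Inspecting `unmatched` after each conversion is a cheap way to catch country names that would otherwise be dropped silently at the merge step.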
## The three datasets are now nearly ready for merging, but only some of the variables in DB_data are needed, so those must be selected
variables<-c("IC.REG.STRT.BUS.DFRN","IC.REG.COST.PC.MA.ZS","IC.REG.PROC.MA.NO",
"IC.REG.DURS.MA.DY","IC.REG.DURS.FE.DY","IC.REG.PROC.FE.NO",
"IC.REG.COST.PC.FE.ZS","IC.REG.MIN.CAP","IC.CNST.PRMT.DFRN.DB1619",
"IC.CNST.PRMT.DFRN.DB0615","IC.CNST.PRMT.PROC.NO","IC.CNST.PRMT.TM.DY",
"IC.CNST.PRMT.COST.WRH.VAL","IC.DCP.BQC.XD.015.DB1619","IC.CNST.PRMT.QBR.XD.02.DB1619",
"IC.CNST.PRMT.QCBC.XD.01.DB1619","IC.CNST.PRMT.QCDC.XD.03.DB1619",
"IC.CNST.PRMT.QCAC.XD.DB1619","IC.CNST.LIR.XD.02.DB1619","IC.CNST.PC.XD.04.DB1619",
"IC.ELC.ACES.DFRN.DB1015","IC.ELC.ACES.DFRN.DB1619","IC.ELC.PROC.NO","IC.ELC.TIME",
"IC.ELC.ACS.COST","IC.ELC.RSTT.XD.08.DB1619","IC.ELC.OUTG.FREQ.DURS.03.DB1619",
"IC.ELC.MONT.OUTG.01.DB1619","IC.ELC.RSTOR.01.DB1619","IC.ELC.REGU.MONT.01.DB1619",
"IC.ELC.LMTG.OUTG.01.DB1619","IC.ELC.COMM.TRFF.CG.01.DB1619",
"IC.REG.PRRT.DFRN.DB0515","IC.REG.PRRT.DFRN.DB1719","IC.REG.PRRT.COST.PRT.VAL",
"IC.REG.PRRT.DURS.TM","IC.REG.PRRT.PROC.NO","IC.REG.PRRT.QUAL.LNDADM.XD.030.DB16",
"IC.REG.PRRT.RELI.INFR.XD.09.DB1619","IC.REG.PRRT.TRAP.INFO.XD.06.DB1619",
"IC.REG.PRRT.GEO.COVR.XD.08.DB1619","IC.REG.PRRT.LAND.DISP.XD.08.DB1619",
"IC.REG.PRRT.EQACCS.XD.08.DB1619","IC.CRED.ACC.CRD.DB0514.DFRN",
"IC.CRED.ACC.CRD.DB1519.DFRN","IC.CRED.ACC.LGL.RGHT.XD.012.DB1519",
"IC.CRED.ACC.DPTH.CISI.XD.08.DB1519","IC.CRED.ACC.PUBL.CRD.REG.COVR.ZS",
"IC.CRED.ACC.PRVT.CRD.ZS","IC.CRED.ACC.ACES.DB1519","IC.CRED.ACC.ACES.DB0514",
"PROT.MINOR.INV.DFRN.DB1519","PROT.MINOR.INV.DFRN.DB0614",
"PROT.MINOR.INV.EXT.BUS.DISC.010.XD","PROT.MINOR.INV.IC.PRIN.EXT.DIR.LGL.010.XD",
"PROT.MINOR.INV.EASE.SHARE.LGL.XD.010.DB1519",
"PROT.MINOR.INV.EASE.SHARE.LGL.XD.010.DB0614",
"PROT.MINOR.INV.EXT.SHARE.RTS.XD.010.DB1519",
"PROT.MINOR.INV.EXT.OWNR.CONT.XD.0100.DB1519",
"PROT.MINOR.INV.EXT.CORP.TRANP.XD.0010.DB1519",
"PROT.MINOR.INV.STRENG.MIN.INV.PROT.XD.010.DB0614",
"PAY.TAX.DB1719.DRFN","PAY.TAX.DB0616.DFRN","PAY.TAX.PYMT.FREQ.NO","PAY.TAX.TM",
"PAY.TAX.TOT.TAX.RT.ZS","PAY.TAX.PRFT.CP.ZS","PAY.TAX.LABR.TAX.CONTR.ZS",
"OTHR.TAX.PAID.ZS","PAY.TAX.COIT.AU.HRS.DB1719","PAY.TAX.COIT.AU.WKS.DB1719",
"PAY.TAX.POST.FIL.XD.0100.DB1719.DFRN","TRD.ACRS.BRDR.DB1619.DFRN",
"TRD.ACRS.BRDR.DB0615.DFRN","TRD.ACRS.BRDR.EXPT.TM.DOC.COMP.HR.DB1619.DFRN",
"TRD.ACRS.BRDR.IMP.TM.DOC.COMP.HR.DB1619.DFRN",
"TRD.ACRS.BRDR.EXPT.TM.BRDR.COMP.HR.DB1619.DFRN",
"TRD.ACRS.BRDR.IMP.TM.BRDR.COMP.HR.DB1619.DFRN",
"TRD.ACRS.BRDR.EXPT.COST.DOC.COMP.CD.DB1619",
"TRD.ACRS.BRDR.IMP.COST.BRDR.COMP.CD.DB1619",
"ENF.CONT.COEN.DB0415.DFRN","ENF.CONT.COEN.DB1719.DFRN",
"ENF.CONT.DURS.DY","ENF.CONT.COEN.FLSR.DY","ENF.CONT.COEN.TRJU.DY",
"ENF.CONT.COEN.ENJU.DY","ENF.CONT.COEN.COST.ZS","ENF.CONT.COEN.ATFE.PR",
"ENF.CONT.COEN.CTFE.PR","ENF.CONT.COEN.ENFE.PR","ENF.CONT.COEN.QUJP.XD",
"ENF.CONT.COEN.CTSP.DB1719","ENF.CONT.COEN.CSMG","ENF.CONT.COEN.CTAU",
"ENF.CONT.COEN.ATDR","RESLV.ISV.DB1519.DFRN","RESLV.ISV.RCOV.RT",
"RESLV.ISV.SOIF.06.DB1519","RESLV.ISV.COPR.03.XD.DB1519","RESLV.ISV.MGDA.XD.DB1519",
"RESLV.ISV.ROPC.03.XD.DB1519","RESLV.ISV.CPI.04.XD.DB1519",
"IC.BUS.EASE.DFRN.XQ.DB1719","IC.BUS.EASE.DFRN.DB16","IC.BUS.EASE.DFRN.DB1014"
)
data.db <- DB_data %>% select(iso3c, country, date, all_of(variables)) # Select the variables needed
## Several scores appear under separate codes (e.g., DB0615 and DB1619);
## coalesce() merges each pair into one column by taking the first non-missing value.
data.db <- data.db %>%
  mutate(
    dealing_w_construct  = coalesce(IC.CNST.PRMT.DFRN.DB0615, IC.CNST.PRMT.DFRN.DB1619),
    getting_electricity  = coalesce(IC.ELC.ACES.DFRN.DB1015, IC.ELC.ACES.DFRN.DB1619),
    registering_property = coalesce(IC.REG.PRRT.DFRN.DB0515, IC.REG.PRRT.DFRN.DB1719),
    getting_credit       = coalesce(IC.CRED.ACC.CRD.DB0514.DFRN, IC.CRED.ACC.CRD.DB1519.DFRN),
    protecting_minority  = coalesce(PROT.MINOR.INV.DFRN.DB0614, PROT.MINOR.INV.DFRN.DB1519),
    paying_taxes         = coalesce(PAY.TAX.DB0616.DFRN, PAY.TAX.DB1719.DRFN),
    trading_borders      = coalesce(TRD.ACRS.BRDR.DB0615.DFRN, TRD.ACRS.BRDR.DB1619.DFRN),
    enforcing_contracts  = coalesce(ENF.CONT.COEN.DB0415.DFRN, ENF.CONT.COEN.DB1719.DFRN),
    overall_score_db     = coalesce(IC.BUS.EASE.DFRN.DB1014,
                                    IC.BUS.EASE.DFRN.DB16,
                                    IC.BUS.EASE.DFRN.XQ.DB1719)
  )
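To make the behaviour of `coalesce()` concrete, here is a minimal sketch on made-up data. The columns `score_old` and `score_new` are hypothetical stand-ins for two codes of the same indicator. Because `coalesce()` returns the first non-missing value in argument order, the earlier column wins wherever both report a value, so it may be worth flagging those rows.

```r
library(dplyr)

# Toy panel with two overlapping versions of the same score (values are invented).
toy <- data.frame(
  year      = c(2014, 2015, 2016, 2017),
  score_old = c(61.2, 63.0, NA, NA),
  score_new = c(NA, 64.5, 66.1, NA)
)

toy <- toy %>%
  mutate(
    score        = coalesce(score_old, score_new),         # first non-missing value wins
    both_present = !is.na(score_old) & !is.na(score_new)   # TRUE where both versions report a value
  )
```

In this toy case the 2015 row keeps 63.0 from `score_old` even though `score_new` differs, and the 2017 row stays missing because neither column has a value.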
data.db2 <- data.db ## Keep the full version in data.db; we may need it later
## Some variables need to be eliminated after creating the new ones
# Vector of variables to be removed
vars_to_remove <- c(
"IC.CNST.PRMT.DFRN.DB0615",
"IC.CNST.PRMT.DFRN.DB1619",
"IC.ELC.ACES.DFRN.DB1015",
"IC.ELC.ACES.DFRN.DB1619",
"IC.REG.PRRT.DFRN.DB0515",
"IC.REG.PRRT.DFRN.DB1719",
"IC.CRED.ACC.CRD.DB0514.DFRN",
"IC.CRED.ACC.CRD.DB1519.DFRN",
"PROT.MINOR.INV.DFRN.DB0614",
"PROT.MINOR.INV.DFRN.DB1519",
"PAY.TAX.DB0616.DFRN",
"PAY.TAX.DB1719.DRFN",
"TRD.ACRS.BRDR.DB0615.DFRN",
"TRD.ACRS.BRDR.DB1619.DFRN",
"ENF.CONT.COEN.DB0415.DFRN",
"ENF.CONT.COEN.DB1719.DFRN",
"IC.BUS.EASE.DFRN.DB1014",
"IC.BUS.EASE.DFRN.DB16",
"IC.BUS.EASE.DFRN.XQ.DB1719"
)
### Remove the variables from the dataset using the vector
data.db2 <- data.db2 %>%
  select(-all_of(vars_to_remove))
data.db2 <- data.db2 %>% rename(year = date, iso3 = iso3c)
research_data<-merge(data.db2,efi.df, by = c("iso3","year"))
research_data<-merge(research_data,cpi_long,by = c("iso3","year"))
research_data<-merge(research_data,educ_long[,-1],by = c("iso3","year"))
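Note that `merge()` performs an inner join by default (`all = FALSE`), so any country-year missing from one of the tables is dropped from the result. The sketch below, on invented toy tables, shows one way to see which keys are lost by comparing against a full outer join; `left`, `right`, and `dropped` are illustrative names only.

```r
# Toy tables keyed by iso3 and year (values are made up for illustration).
left  <- data.frame(iso3 = c("FRA", "FRA", "DEU"),
                    year = c(2015, 2016, 2015),
                    score = c(70, 71, 80))
right <- data.frame(iso3 = c("FRA", "DEU", "ITA"),
                    year = c(2015, 2015, 2015),
                    cpi = c(69, 81, 44))

inner <- merge(left, right, by = c("iso3", "year"))             # default: all = FALSE
outer <- merge(left, right, by = c("iso3", "year"), all = TRUE) # keeps unmatched rows

# Country-years present in only one table, i.e. dropped by the inner join.
dropped <- outer[is.na(outer$score) | is.na(outer$cpi), c("iso3", "year")]
```

Here the inner join keeps two rows, while `dropped` lists FRA 2016 and ITA 2015. This check assumes the value columns have no missing entries of their own before merging.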
## Replace the "N/A" and "NaN" placeholders with NA first, so that R recognizes them as missing
research_data[research_data == "N/A"] <- NA
research_data[research_data == "NaN"] <- NA
# Then convert all columns to numeric except the identifier columns
id_cols <- names(research_data) %in% c("country.x", "country.y", "iso3", "country")
research_data[, !id_cols] <- lapply(research_data[, !id_cols], as.numeric)
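The order of these two operations matters for how cleanly the conversion runs: `as.numeric()` turns any non-numeric string into `NA` and emits a coercion warning, so recoding placeholder strings while the column is still character keeps the conversion quiet and the intent explicit. A minimal sketch on a toy data frame (all values invented):

```r
# Toy data frame with placeholder strings in an otherwise numeric column.
toy <- data.frame(iso3  = c("FRA", "DEU", "ITA"),
                  score = c("71.5", "N/A", "NaN"),
                  stringsAsFactors = FALSE)

# 1. Recode placeholders while the column is still character.
toy$score[toy$score %in% c("N/A", "NaN")] <- NA

# 2. Coerce afterwards; only numeric strings and NA remain at this point.
toy$score <- as.numeric(toy$score)
```

After step 2, `score` is a numeric column holding one observed value and two missing values.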
## We impute the missing data. The paper reports using multivariate normal imputation; here missRanger (chained random forest imputation) is used instead
library(missRanger)
imputed_data<-missRanger(research_data,verbose=1)
##
## Variables to impute: government_integrity, business_freedom, labor_freedom, monetary_freedom, property_rights, government_spending, investment_freedom, cpi, trade_freedom, tax_burden, financial_freedom, overall_score, IC.REG.MIN.CAP, PROT.MINOR.INV.EXT.BUS.DISC.010.XD, PROT.MINOR.INV.IC.PRIN.EXT.DIR.LGL.010.XD, RESLV.ISV.SOIF.06.DB1519, RESLV.ISV.COPR.03.XD.DB1519, RESLV.ISV.MGDA.XD.DB1519, RESLV.ISV.ROPC.03.XD.DB1519, RESLV.ISV.CPI.04.XD.DB1519, protecting_minority, IC.REG.STRT.BUS.DFRN, IC.REG.PROC.MA.NO, IC.REG.DURS.MA.DY, IC.REG.DURS.FE.DY, IC.REG.PROC.FE.NO, IC.CRED.ACC.PUBL.CRD.REG.COVR.ZS, IC.CRED.ACC.PRVT.CRD.ZS, ENF.CONT.DURS.DY, ENF.CONT.COEN.FLSR.DY, ENF.CONT.COEN.TRJU.DY, ENF.CONT.COEN.ENJU.DY, ENF.CONT.COEN.COST.ZS, ENF.CONT.COEN.ATFE.PR, ENF.CONT.COEN.CTFE.PR, ENF.CONT.COEN.ENFE.PR, RESLV.ISV.DB1519.DFRN, RESLV.ISV.RCOV.RT, dealing_w_construct, getting_electricity, registering_property, getting_credit, paying_taxes, trading_borders, enforcing_contracts, overall_score_db, PAY.TAX.PYMT.FREQ.NO, PAY.TAX.TM, PAY.TAX.TOT.TAX.RT.ZS, PAY.TAX.PRFT.CP.ZS, PAY.TAX.LABR.TAX.CONTR.ZS, OTHR.TAX.PAID.ZS, IC.ELC.PROC.NO, IC.ELC.ACS.COST, IC.REG.PRRT.COST.PRT.VAL, IC.REG.PRRT.DURS.TM, IC.REG.PRRT.PROC.NO, IC.CNST.PRMT.PROC.NO, IC.CNST.PRMT.TM.DY, IC.CNST.PRMT.COST.WRH.VAL, IC.REG.COST.PC.MA.ZS, IC.REG.COST.PC.FE.ZS, PROT.MINOR.INV.EASE.SHARE.LGL.XD.010.DB1519, PROT.MINOR.INV.EXT.SHARE.RTS.XD.010.DB1519, PROT.MINOR.INV.EXT.OWNR.CONT.XD.0100.DB1519, PROT.MINOR.INV.EXT.CORP.TRANP.XD.0010.DB1519, PROT.MINOR.INV.STRENG.MIN.INV.PROT.XD.010.DB0614, IC.CRED.ACC.LGL.RGHT.XD.012.DB1519, IC.CRED.ACC.DPTH.CISI.XD.08.DB1519, IC.CRED.ACC.ACES.DB1519, IC.ELC.TIME, TRD.ACRS.BRDR.EXPT.TM.DOC.COMP.HR.DB1619.DFRN, TRD.ACRS.BRDR.IMP.TM.DOC.COMP.HR.DB1619.DFRN, TRD.ACRS.BRDR.EXPT.TM.BRDR.COMP.HR.DB1619.DFRN, TRD.ACRS.BRDR.IMP.TM.BRDR.COMP.HR.DB1619.DFRN, TRD.ACRS.BRDR.EXPT.COST.DOC.COMP.CD.DB1619, TRD.ACRS.BRDR.IMP.COST.BRDR.COMP.CD.DB1619, IC.ELC.RSTT.XD.08.DB1619, 
IC.ELC.OUTG.FREQ.DURS.03.DB1619, IC.ELC.MONT.OUTG.01.DB1619, IC.ELC.RSTOR.01.DB1619, IC.ELC.REGU.MONT.01.DB1619, IC.ELC.LMTG.OUTG.01.DB1619, IC.ELC.COMM.TRFF.CG.01.DB1619, IC.DCP.BQC.XD.015.DB1619, IC.CNST.PRMT.QBR.XD.02.DB1619, IC.CNST.PRMT.QCBC.XD.01.DB1619, IC.CNST.PRMT.QCDC.XD.03.DB1619, IC.CNST.PRMT.QCAC.XD.DB1619, IC.CNST.LIR.XD.02.DB1619, IC.CNST.PC.XD.04.DB1619, judicial_effectiveness, fiscal_health, ENF.CONT.COEN.QUJP.XD, ENF.CONT.COEN.CTSP.DB1719, ENF.CONT.COEN.CSMG, ENF.CONT.COEN.CTAU, ENF.CONT.COEN.ATDR, IC.REG.PRRT.QUAL.LNDADM.XD.030.DB16, IC.REG.PRRT.RELI.INFR.XD.09.DB1619, IC.REG.PRRT.TRAP.INFO.XD.06.DB1619, IC.REG.PRRT.GEO.COVR.XD.08.DB1619, IC.REG.PRRT.LAND.DISP.XD.08.DB1619, IC.REG.PRRT.EQACCS.XD.08.DB1619, PAY.TAX.POST.FIL.XD.0100.DB1719.DFRN, PAY.TAX.COIT.AU.HRS.DB1719, PAY.TAX.COIT.AU.WKS.DB1719, PROT.MINOR.INV.EASE.SHARE.LGL.XD.010.DB0614, IC.CRED.ACC.ACES.DB0514
## Variables used to impute: iso3, year, country.x, IC.REG.STRT.BUS.DFRN, IC.REG.COST.PC.MA.ZS, IC.REG.PROC.MA.NO, IC.REG.DURS.MA.DY, IC.REG.DURS.FE.DY, IC.REG.PROC.FE.NO, IC.REG.COST.PC.FE.ZS, IC.REG.MIN.CAP, IC.CNST.PRMT.PROC.NO, IC.CNST.PRMT.TM.DY, IC.CNST.PRMT.COST.WRH.VAL, IC.DCP.BQC.XD.015.DB1619, IC.CNST.PRMT.QBR.XD.02.DB1619, IC.CNST.PRMT.QCBC.XD.01.DB1619, IC.CNST.PRMT.QCDC.XD.03.DB1619, IC.CNST.PRMT.QCAC.XD.DB1619, IC.CNST.LIR.XD.02.DB1619, IC.CNST.PC.XD.04.DB1619, IC.ELC.PROC.NO, IC.ELC.TIME, IC.ELC.ACS.COST, IC.ELC.RSTT.XD.08.DB1619, IC.ELC.OUTG.FREQ.DURS.03.DB1619, IC.ELC.MONT.OUTG.01.DB1619, IC.ELC.RSTOR.01.DB1619, IC.ELC.REGU.MONT.01.DB1619, IC.ELC.LMTG.OUTG.01.DB1619, IC.ELC.COMM.TRFF.CG.01.DB1619, IC.REG.PRRT.COST.PRT.VAL, IC.REG.PRRT.DURS.TM, IC.REG.PRRT.PROC.NO, IC.REG.PRRT.QUAL.LNDADM.XD.030.DB16, IC.REG.PRRT.RELI.INFR.XD.09.DB1619, IC.REG.PRRT.TRAP.INFO.XD.06.DB1619, IC.REG.PRRT.GEO.COVR.XD.08.DB1619, IC.REG.PRRT.LAND.DISP.XD.08.DB1619, IC.REG.PRRT.EQACCS.XD.08.DB1619, IC.CRED.ACC.LGL.RGHT.XD.012.DB1519, IC.CRED.ACC.DPTH.CISI.XD.08.DB1519, IC.CRED.ACC.PUBL.CRD.REG.COVR.ZS, IC.CRED.ACC.PRVT.CRD.ZS, IC.CRED.ACC.ACES.DB1519, IC.CRED.ACC.ACES.DB0514, PROT.MINOR.INV.EXT.BUS.DISC.010.XD, PROT.MINOR.INV.IC.PRIN.EXT.DIR.LGL.010.XD, PROT.MINOR.INV.EASE.SHARE.LGL.XD.010.DB1519, PROT.MINOR.INV.EASE.SHARE.LGL.XD.010.DB0614, PROT.MINOR.INV.EXT.SHARE.RTS.XD.010.DB1519, PROT.MINOR.INV.EXT.OWNR.CONT.XD.0100.DB1519, PROT.MINOR.INV.EXT.CORP.TRANP.XD.0010.DB1519, PROT.MINOR.INV.STRENG.MIN.INV.PROT.XD.010.DB0614, PAY.TAX.PYMT.FREQ.NO, PAY.TAX.TM, PAY.TAX.TOT.TAX.RT.ZS, PAY.TAX.PRFT.CP.ZS, PAY.TAX.LABR.TAX.CONTR.ZS, OTHR.TAX.PAID.ZS, PAY.TAX.COIT.AU.HRS.DB1719, PAY.TAX.COIT.AU.WKS.DB1719, PAY.TAX.POST.FIL.XD.0100.DB1719.DFRN, TRD.ACRS.BRDR.EXPT.TM.DOC.COMP.HR.DB1619.DFRN, TRD.ACRS.BRDR.IMP.TM.DOC.COMP.HR.DB1619.DFRN, TRD.ACRS.BRDR.EXPT.TM.BRDR.COMP.HR.DB1619.DFRN, TRD.ACRS.BRDR.IMP.TM.BRDR.COMP.HR.DB1619.DFRN, TRD.ACRS.BRDR.EXPT.COST.DOC.COMP.CD.DB1619, 
TRD.ACRS.BRDR.IMP.COST.BRDR.COMP.CD.DB1619, ENF.CONT.DURS.DY, ENF.CONT.COEN.FLSR.DY, ENF.CONT.COEN.TRJU.DY, ENF.CONT.COEN.ENJU.DY, ENF.CONT.COEN.COST.ZS, ENF.CONT.COEN.ATFE.PR, ENF.CONT.COEN.CTFE.PR, ENF.CONT.COEN.ENFE.PR, ENF.CONT.COEN.QUJP.XD, ENF.CONT.COEN.CTSP.DB1719, ENF.CONT.COEN.CSMG, ENF.CONT.COEN.CTAU, ENF.CONT.COEN.ATDR, RESLV.ISV.DB1519.DFRN, RESLV.ISV.RCOV.RT, RESLV.ISV.SOIF.06.DB1519, RESLV.ISV.COPR.03.XD.DB1519, RESLV.ISV.MGDA.XD.DB1519, RESLV.ISV.ROPC.03.XD.DB1519, RESLV.ISV.CPI.04.XD.DB1519, dealing_w_construct, getting_electricity, registering_property, getting_credit, protecting_minority, paying_taxes, trading_borders, enforcing_contracts, overall_score_db, country.y, overall_score, property_rights, government_integrity, judicial_effectiveness, tax_burden, government_spending, fiscal_health, business_freedom, labor_freedom, monetary_freedom, trade_freedom, investment_freedom, financial_freedom, country, cpi, educ_index
##
## iter 1
## iter 2
## iter 3
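Because random forest imputation is stochastic, repeated runs can give different imputed values. The sketch below shows, on the built-in `iris` data rather than the research data, how `missRanger()` can be made reproducible with its `seed` argument and how predictive mean matching (`pmm.k`) restricts imputations to values actually observed. The parameter values are arbitrary choices for illustration, not settings taken from the paper.

```r
library(missRanger)

# Built-in iris data with 10% of values removed at random (illustrative only).
iris_na <- generateNA(iris, p = 0.1, seed = 1)

# seed makes the run reproducible; pmm.k = 3 draws each imputed value from the
# 3 observed values with the closest predictions; num.trees is passed to ranger.
iris_imp <- missRanger(iris_na, pmm.k = 3, num.trees = 50, seed = 123, verbose = 0)

# No missing values should remain after imputation.
remaining_na <- sum(is.na(iris_imp))
```

Setting `verbose = 0` also suppresses the progress bars, which keeps knitted output short.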
Descriptive Statistics:
# Descriptive Statistics
library(modelsummary)
## `modelsummary` 2.0.0 now uses `tinytable` as its default table-drawing
## backend. Learn more at: https://vincentarelbundock.github.io/tinytable/
##
## Revert to `kableExtra` for one session:
##
## options(modelsummary_factory_default = 'kableExtra')
## options(modelsummary_factory_latex = 'kableExtra')
## options(modelsummary_factory_html = 'kableExtra')
##
## Silence this message forever:
##
## config_modelsummary(startup_message = FALSE)
desc_stats <- datasummary_skim(imputed_data[,-c(1:3,113)])
desc_stats
| Variable | Unique | Missing Pct. | Mean | SD | Min | Median | Max | Histogram |
|---|---|---|---|---|---|---|---|---|
| IC.REG.STRT.BUS.DFRN | 2100 | 0 | 80.2 | 13.2 | 15.5 | 83.2 | 100.0 | |
| IC.REG.COST.PC.MA.ZS | 1313 | 0 | 25.9 | 38.3 | 0.0 | 12.9 | 393.0 | |
| IC.REG.PROC.MA.NO | 635 | 0 | 7.6 | 2.9 | 1.0 | 7.2 | 20.0 | |
| IC.REG.DURS.MA.DY | 771 | 0 | 23.4 | 28.9 | 0.5 | 15.0 | 690.0 | |
| IC.REG.DURS.FE.DY | 774 | 0 | 23.5 | 28.9 | 0.5 | 15.4 | 690.0 | |
| IC.REG.PROC.FE.NO | 635 | 0 | 7.7 | 3.0 | 1.0 | 7.5 | 20.0 | |
| IC.REG.COST.PC.FE.ZS | 1314 | 0 | 26.1 | 38.6 | 0.0 | 12.9 | 393.0 | |
| IC.REG.MIN.CAP | 931 | 0 | 35.1 | 269.8 | 0.0 | 0.3 | 7445.4 | |
| IC.CNST.PRMT.PROC.NO | 683 | 0 | 15.3 | 4.1 | 7.0 | 15.0 | 44.0 | |
| IC.CNST.PRMT.TM.DY | 918 | 0 | 171.3 | 78.5 | 27.5 | 161.0 | 677.0 | |
| IC.CNST.PRMT.COST.WRH.VAL | 906 | 0 | 6.5 | 9.2 | 0.0 | 3.3 | 79.1 | |
| IC.DCP.BQC.XD.015.DB1619 | 1033 | 0 | 10.3 | 2.6 | 1.0 | 11.0 | 15.0 | |
| IC.CNST.PRMT.QBR.XD.02.DB1619 | 923 | 0 | 1.6 | 0.5 | 0.0 | 1.9 | 2.0 | |
| IC.CNST.PRMT.QCBC.XD.01.DB1619 | 552 | 0 | 0.9 | 0.3 | 0.0 | 1.0 | 1.0 | |
| IC.CNST.PRMT.QCDC.XD.03.DB1619 | 967 | 0 | 1.7 | 0.7 | 0.0 | 2.0 | 3.0 | |
| IC.CNST.PRMT.QCAC.XD.DB1619 | 810 | 0 | 2.7 | 0.6 | 0.0 | 3.0 | 3.0 | |
| IC.CNST.LIR.XD.02.DB1619 | 1001 | 0 | 0.8 | 0.6 | 0.0 | 0.9 | 2.0 | |
| IC.CNST.PC.XD.04.DB1619 | 999 | 0 | 2.6 | 1.3 | 0.0 | 2.8 | 4.0 | |
| IC.ELC.PROC.NO | 638 | 0 | 5.2 | 1.3 | 2.0 | 5.0 | 10.0 | |
| IC.ELC.TIME | 965 | 0 | 94.5 | 56.1 | 7.0 | 82.2 | 482.0 | |
| IC.ELC.ACS.COST | 1967 | 0 | 1381.8 | 2645.1 | 0.0 | 421.7 | 34090.5 | |
| IC.ELC.RSTT.XD.08.DB1619 | 989 | 0 | 4.0 | 3.0 | 0.0 | 5.0 | 8.0 | |
| IC.ELC.OUTG.FREQ.DURS.03.DB1619 | 895 | 0 | 1.2 | 1.2 | 0.0 | 1.0 | 3.0 | |
| IC.ELC.MONT.OUTG.01.DB1619 | 647 | 0 | 0.7 | 0.4 | 0.0 | 1.0 | 1.0 | |
| IC.ELC.RSTOR.01.DB1619 | 677 | 0 | 0.7 | 0.4 | 0.0 | 1.0 | 1.0 | |
| IC.ELC.REGU.MONT.01.DB1619 | 757 | 0 | 0.8 | 0.4 | 0.0 | 1.0 | 1.0 | |
| IC.ELC.LMTG.OUTG.01.DB1619 | 949 | 0 | 0.5 | 0.4 | 0.0 | 0.4 | 1.0 | |
| IC.ELC.COMM.TRFF.CG.01.DB1619 | 796 | 0 | 0.8 | 0.4 | 0.0 | 1.0 | 1.0 | |
| IC.REG.PRRT.COST.PRT.VAL | 777 | 0 | 5.6 | 3.8 | 0.0 | 5.0 | 28.0 | |
| IC.REG.PRRT.DURS.TM | 792 | 0 | 46.7 | 48.9 | 1.0 | 36.5 | 319.0 | |
| IC.REG.PRRT.PROC.NO | 641 | 0 | 6.0 | 2.0 | 1.0 | 6.0 | 14.0 | |
| IC.REG.PRRT.QUAL.LNDADM.XD.030.DB16 | 1243 | 0 | 14.3 | 7.0 | 2.5 | 14.0 | 28.5 | |
| IC.REG.PRRT.RELI.INFR.XD.09.DB1619 | 1184 | 0 | 4.0 | 2.7 | 0.0 | 4.1 | 8.0 | |
| IC.REG.PRRT.TRAP.INFO.XD.06.DB1619 | 1196 | 0 | 2.8 | 1.2 | 0.0 | 3.0 | 6.0 | |
| IC.REG.PRRT.GEO.COVR.XD.08.DB1619 | 1096 | 0 | 2.6 | 2.8 | 0.0 | 1.4 | 8.0 | |
| IC.REG.PRRT.LAND.DISP.XD.08.DB1619 | 1185 | 0 | 5.0 | 1.3 | 0.5 | 5.0 | 8.0 | |
| IC.REG.PRRT.EQACCS.XD.08.DB1619 | 514 | 0 | -0.1 | 0.2 | -1.0 | 0.0 | 0.0 | |
| IC.CRED.ACC.LGL.RGHT.XD.012.DB1519 | 786 | 0 | 5.0 | 2.7 | 0.0 | 5.0 | 12.0 | |
| IC.CRED.ACC.DPTH.CISI.XD.08.DB1519 | 782 | 0 | 5.1 | 2.9 | 0.0 | 6.4 | 8.0 | |
| IC.CRED.ACC.PUBL.CRD.REG.COVR.ZS | 986 | 0 | 12.1 | 20.8 | 0.0 | 2.4 | 100.0 | |
| IC.CRED.ACC.PRVT.CRD.ZS | 1060 | 0 | 33.2 | 35.5 | 0.0 | 19.8 | 100.0 | |
| IC.CRED.ACC.ACES.DB1519 | 794 | 0 | 10.1 | 4.3 | 0.0 | 10.0 | 20.0 | |
| IC.CRED.ACC.ACES.DB0514 | 1779 | 0 | 9.2 | 3.0 | 1.0 | 9.2 | 16.0 | |
| PROT.MINOR.INV.EXT.BUS.DISC.010.XD | 590 | 0 | 5.8 | 2.2 | 0.0 | 6.0 | 10.0 | |
| PROT.MINOR.INV.IC.PRIN.EXT.DIR.LGL.010.XD | 589 | 0 | 4.6 | 2.3 | 0.0 | 4.8 | 10.0 | |
| PROT.MINOR.INV.EASE.SHARE.LGL.XD.010.DB1519 | 779 | 0 | 6.0 | 1.8 | 0.0 | 6.0 | 10.0 | |
| PROT.MINOR.INV.EASE.SHARE.LGL.XD.010.DB0614 | 1750 | 0 | 5.6 | 1.6 | 0.0 | 6.0 | 10.0 | |
| PROT.MINOR.INV.EXT.SHARE.RTS.XD.010.DB1519 | 776 | 0 | 3.4 | 1.8 | 0.0 | 4.0 | 6.0 | |
| PROT.MINOR.INV.EXT.OWNR.CONT.XD.0100.DB1519 | 778 | 0 | 3.1 | 2.0 | 0.0 | 3.1 | 7.0 | |
| PROT.MINOR.INV.EXT.CORP.TRANP.XD.0010.DB1519 | 778 | 0 | 3.5 | 2.2 | 0.0 | 4.0 | 7.0 | |
| PROT.MINOR.INV.STRENG.MIN.INV.PROT.XD.010.DB0614 | 815 | 0 | 26.3 | 8.6 | 0.0 | 28.0 | 46.0 | |
| PAY.TAX.PYMT.FREQ.NO | 680 | 0 | 25.5 | 15.9 | 3.0 | 23.8 | 99.0 | |
| PAY.TAX.TM | 953 | 0 | 269.3 | 243.8 | 12.0 | 229.0 | 2600.0 | |
| PAY.TAX.TOT.TAX.RT.ZS | 1055 | 0 | 41.3 | 20.3 | 7.4 | 38.1 | 339.1 | |
| PAY.TAX.PRFT.CP.ZS | 883 | 0 | 16.2 | 7.9 | -0.2 | 17.1 | 58.9 | |
| PAY.TAX.LABR.TAX.CONTR.ZS | 903 | 0 | 17.4 | 10.2 | 0.0 | 16.2 | 54.0 | |
| OTHR.TAX.PAID.ZS | 793 | 0 | 7.1 | 18.7 | 0.0 | 2.3 | 272.3 | |
| PAY.TAX.COIT.AU.HRS.DB1719 | 1269 | 0 | 15.6 | 18.0 | 1.0 | 9.2 | 207.5 | |
| PAY.TAX.COIT.AU.WKS.DB1719 | 1264 | 0 | 11.9 | 17.1 | 0.0 | 4.4 | 113.3 | |
| PAY.TAX.POST.FIL.XD.0100.DB1719.DFRN | 1378 | 0 | 57.3 | 25.1 | 0.0 | 56.1 | 100.0 | |
| TRD.ACRS.BRDR.EXPT.TM.DOC.COMP.HR.DB1619.DFRN | 1085 | 0 | 71.8 | 27.1 | 0.0 | 75.8 | 100.0 | |
| TRD.ACRS.BRDR.IMP.TM.DOC.COMP.HR.DB1619.DFRN | 1101 | 0 | 72.6 | 25.8 | 0.0 | 77.7 | 100.0 | |
| TRD.ACRS.BRDR.EXPT.TM.BRDR.COMP.HR.DB1619.DFRN | 1134 | 0 | 64.2 | 26.1 | 0.0 | 66.0 | 100.0 | |
| TRD.ACRS.BRDR.IMP.TM.BRDR.COMP.HR.DB1619.DFRN | 1139 | 0 | 71.1 | 24.8 | 0.0 | 74.6 | 100.0 | |
| TRD.ACRS.BRDR.EXPT.COST.DOC.COMP.CD.DB1619 | 1095 | 0 | 128.2 | 133.8 | 0.0 | 100.0 | 1800.0 | |
| TRD.ACRS.BRDR.IMP.COST.BRDR.COMP.CD.DB1619 | 1173 | 0 | 473.8 | 341.7 | 0.0 | 450.0 | 3039.0 | |
| ENF.CONT.DURS.DY | 822 | 0 | 645.9 | 286.8 | 164.0 | 575.0 | 1785.0 | |
| ENF.CONT.COEN.FLSR.DY | 666 | 0 | 40.1 | 23.2 | 6.0 | 34.6 | 200.0 | |
| ENF.CONT.COEN.TRJU.DY | 736 | 0 | 412.8 | 225.9 | 90.0 | 365.0 | 1420.0 | |
| ENF.CONT.COEN.ENJU.DY | 680 | 0 | 189.0 | 99.0 | 30.0 | 178.2 | 600.0 | |
| ENF.CONT.COEN.COST.ZS | 788 | 0 | 32.9 | 18.5 | 0.1 | 27.9 | 163.2 | |
| ENF.CONT.COEN.ATFE.PR | 715 | 0 | 20.5 | 15.0 | 0.0 | 17.3 | 155.7 | |
| ENF.CONT.COEN.CTFE.PR | 698 | 0 | 6.3 | 4.3 | 0.1 | 5.3 | 40.2 | |
| ENF.CONT.COEN.ENFE.PR | 696 | 0 | 5.6 | 5.0 | 0.0 | 5.0 | 38.3 | |
| ENF.CONT.COEN.QUJP.XD | 1202 | 0 | 8.3 | 2.9 | 1.5 | 7.6 | 16.5 | |
| ENF.CONT.COEN.CTSP.DB1719 | 1173 | 0 | 3.3 | 0.9 | 0.0 | 3.3 | 5.0 | |
| ENF.CONT.COEN.CSMG | 1178 | 0 | 1.8 | 1.3 | 0.0 | 1.5 | 5.5 | |
| ENF.CONT.COEN.CTAU | 1148 | 0 | 0.9 | 1.0 | 0.0 | 0.5 | 4.0 | |
| ENF.CONT.COEN.ATDR | 1156 | 0 | 2.3 | 0.4 | 0.0 | 2.3 | 3.0 | |
| RESLV.ISV.DB1519.DFRN | 1757 | 0 | 46.3 | 22.1 | 0.0 | 42.9 | 93.9 | |
| RESLV.ISV.RCOV.RT | 1160 | 0 | 37.9 | 24.1 | 0.0 | 32.6 | 93.1 | |
| RESLV.ISV.SOIF.06.DB1519 | 605 | 0 | 8.3 | 3.7 | 0.0 | 8.5 | 15.5 | |
| RESLV.ISV.COPR.03.XD.DB1519 | 562 | 0 | 2.4 | 0.5 | 0.0 | 2.5 | 3.0 | |
| RESLV.ISV.MGDA.XD.DB1519 | 588 | 0 | 4.0 | 1.5 | 0.0 | 4.0 | 6.0 | |
| RESLV.ISV.ROPC.03.XD.DB1519 | 579 | 0 | 0.9 | 1.0 | 0.0 | 0.5 | 3.0 | |
| RESLV.ISV.CPI.04.XD.DB1519 | 582 | 0 | 1.5 | 0.9 | 0.0 | 1.0 | 4.0 | |
| dealing_w_construct | 2035 | 0 | 63.0 | 15.5 | 0.0 | 66.2 | 91.6 | |
| getting_electricity | 2057 | 0 | 68.5 | 18.0 | 0.0 | 71.5 | 100.0 | |
| registering_property | 2064 | 0 | 63.5 | 16.0 | 0.0 | 63.7 | 99.9 | |
| getting_credit | 636 | 0 | 53.8 | 21.5 | 0.0 | 55.0 | 100.0 | |
| protecting_minority | 638 | 0 | 52.5 | 16.8 | 0.0 | 55.1 | 96.7 | |
| paying_taxes | 1712 | 0 | 67.9 | 16.4 | 0.0 | 69.6 | 100.0 | |
| trading_borders | 1498 | 0 | 68.4 | 20.6 | 0.0 | 69.8 | 100.0 | |
| enforcing_contracts | 1235 | 0 | 56.1 | 13.1 | 3.6 | 57.0 | 89.2 | |
| overall_score_db | 2128 | 0 | 61.8 | 13.0 | 20.0 | 61.8 | 89.5 | |
| overall_score | 504 | 0 | 60.3 | 10.2 | 24.7 | 59.4 | 89.7 | |
| property_rights | 549 | 0 | 47.9 | 22.7 | 0.2 | 45.0 | 100.0 | |
| government_integrity | 544 | 0 | 41.8 | 20.2 | 5.0 | 36.3 | 99.5 | |
| judicial_effectiveness | 1529 | 0 | 45.1 | 19.1 | 3.9 | 42.1 | 98.0 | |
| tax_burden | 518 | 0 | 77.8 | 11.7 | 37.2 | 79.1 | 100.0 | |
| government_spending | 716 | 0 | 64.9 | 22.4 | 0.0 | 69.9 | 97.0 | |
| fiscal_health | 1548 | 0 | 64.2 | 25.5 | 0.0 | 69.8 | 100.0 | |
| business_freedom | 616 | 0 | 63.7 | 15.5 | 10.0 | 63.9 | 99.9 | |
| labor_freedom | 576 | 0 | 59.8 | 14.7 | 20.0 | 59.5 | 98.5 | |
| monetary_freedom | 372 | 0 | 74.6 | 9.6 | 0.0 | 75.9 | 91.7 | |
| trade_freedom | 392 | 0 | 74.1 | 10.8 | 0.0 | 75.0 | 95.0 | |
| investment_freedom | 54 | 0 | 54.6 | 22.6 | 0.0 | 60.0 | 95.0 | |
| financial_freedom | 67 | 0 | 48.1 | 18.6 | 0.0 | 50.0 | 90.0 | |
| cpi | 121 | 0 | 42.6 | 18.6 | 8.0 | 38.0 | 92.0 | |
| educ_index | 618 | 0 | 0.6 | 0.2 | 0.2 | 0.7 | 1.0 | |
Random Forest Algorithm :
# Load the modelling and plotting packages
library(randomForest)
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:dplyr':
##
## combine
## The following object is masked from 'package:ggplot2':
##
## margin
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
# Set seed for reproducibility
set.seed(9000)
# Recode the dependent variable: the paper divides the CPI into 4 categories
# Create 'cpi_level' from four fixed, equal-width ranges (0-25, 25-50, 50-75, 75-100);
# these are fixed cut points, not sample quartiles
imputed_data <- imputed_data %>%
  mutate(cpi_level = cut(cpi,
                         breaks = c(0, 25, 50, 75, 100),
                         labels = c(0, 1, 2, 3),
                         right = TRUE, include.lowest = TRUE))
# Check the distribution of the new variable
table(imputed_data$cpi_level)
##
## 0 1 2 3
## 318 1214 435 161
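The uneven counts above follow from using fixed cut points rather than quartiles of the data. A minimal sketch of the difference, assuming a made-up vector `toy_cpi` (hypothetical values, not taken from the dataset):

```r
# Hypothetical CPI-like scores, for illustration only
toy_cpi <- c(8, 18, 25, 26, 38, 40, 50, 51, 75, 92)

# Fixed, equal-width bins as in the analysis: [0,25], (25,50], (50,75], (75,100]
fixed_bins <- cut(toy_cpi,
                  breaks = c(0, 25, 50, 75, 100),
                  labels = c(0, 1, 2, 3),
                  right = TRUE, include.lowest = TRUE)

# Data-driven quartile bins, shown only for contrast
quartile_bins <- cut(toy_cpi,
                     breaks = quantile(toy_cpi, probs = seq(0, 1, 0.25)),
                     labels = c(0, 1, 2, 3),
                     include.lowest = TRUE)

table(fixed_bins)
table(quartile_bins)
```

With fixed breaks the class sizes depend on where the scores happen to fall, which is why class 1 dominates in the real data; quartile breaks would give roughly balanced classes but would no longer match the paper's categories.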
# Define control for 10-fold cross-validation
control <- trainControl(method = "cv", number = 10) # 10-fold cross-validation
# Train the Random Forest model using 10-fold cross-validation
rf_model <- train(
  cpi_level ~ .,                                  # Formula for the model (outcome ~ predictors)
  data = imputed_data[, -c(1:3, 99, 113, 114)],   # Dataset
  method = "rf",                                  # Random Forest method
  trControl = control,                            # Control for cross-validation
  importance = TRUE,                              # To calculate variable importance
  ntree = 300                                     # Number of trees (can be adjusted)
)
# Print model summary
print(rf_model)
## Random Forest
##
## 2128 samples
## 109 predictor
## 4 classes: '0', '1', '2', '3'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1915, 1916, 1916, 1915, 1917, 1915, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.9313614 0.8843758
## 55 0.9214845 0.8682596
## 109 0.9219672 0.8693415
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
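The "Summary of sample sizes" line can be understood from how 2128 rows split into ten folds. The sketch below uses plain random fold assignment in base R; this is an assumption for illustration, since caret builds its folds with class stratification, so its exact sizes can differ slightly.

```r
set.seed(9000)
n <- 2128   # number of rows reported by print(rf_model)
k <- 10     # number of folds

# Randomly assign each row to one of k folds of (almost) equal size
fold_id <- sample(rep(seq_len(k), length.out = n))

# Training-set size for each fold = all rows that are not held out
train_sizes <- n - as.vector(table(fold_id))
train_sizes
```

Each model is fitted on roughly 1915 rows and evaluated on the remaining 212 or 213, and the reported accuracy is the average over the ten held-out folds.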
# Get variable importance
# If the importance is split across multiple classes, calculate the mean importance
var_imp <- varImp(rf_model, scale = FALSE)
var_imp_df <- as.data.frame(var_imp$importance)
var_imp_df$Overall <- rowMeans(var_imp_df) # Averaging across all classes
# Calculate total importance (sum of overall importance)
total_importance <- sum(var_imp_df$Overall)
# Calculate heuristic importance for each variable
var_imp_df$heuristic_importance <- var_imp_df$Overall / total_importance
# Sort by heuristic importance and select top 20 variables
top_20_vars <- var_imp_df %>%
arrange(desc(heuristic_importance)) %>%
head(20)
# Add a column for variable names to top_20_vars
top_20_vars$variable <- rownames(top_20_vars)
# Create a bar plot for the top 20 important variables
importance_plot_rf <- plot_ly(
top_20_vars,
x = ~variable,
y = ~heuristic_importance,
type = 'bar',
marker = list(color = 'blue')
) %>%
layout(
title = 'Top 20 Variable Importance (Heuristic Method)',
xaxis = list(title = 'Variables'),
yaxis = list(title = 'Heuristic Importance'),
barmode = 'group'
)
# the plot
importance_plot_rf
Displaying the variables' importance as a table to make it more readable:
## Displaying top 20 predictors :
print(top_20_vars)
## 0 1 2
## government_integrity 10.275240 9.120248 9.965555
## overall_score 8.752978 9.612558 9.619705
## judicial_effectiveness 7.614339 9.485374 10.005600
## IC.CNST.PRMT.COST.WRH.VAL 7.962846 8.964782 10.238625
## RESLV.ISV.RCOV.RT 8.900820 8.859588 7.561705
## property_rights 7.330619 8.855799 8.595582
## PAY.TAX.TM 6.147319 9.601270 9.587344
## educ_index 6.259592 9.153169 8.683438
## RESLV.ISV.DB1519.DFRN 8.420564 8.633315 7.803198
## TRD.ACRS.BRDR.IMP.TM.DOC.COMP.HR.DB1619.DFRN 8.557799 9.207875 7.199237
## ENF.CONT.COEN.CTFE.PR 7.435216 9.196429 9.860576
## PAY.TAX.POST.FIL.XD.0100.DB1719.DFRN 7.315982 8.765450 9.102325
## TRD.ACRS.BRDR.EXPT.TM.DOC.COMP.HR.DB1619.DFRN 8.770743 10.444583 7.240865
## IC.REG.COST.PC.FE.ZS 7.518895 10.037614 7.040299
## financial_freedom 8.073599 8.658639 7.315675
## IC.CNST.PRMT.TM.DY 6.710858 10.549557 8.465423
## tax_burden 7.058841 9.921912 7.726866
## ENF.CONT.COEN.COST.ZS 6.159220 9.384674 8.518290
## IC.ELC.ACS.COST 7.411110 8.902299 7.638461
## PAY.TAX.LABR.TAX.CONTR.ZS 7.279932 8.616527 9.380723
## 3 Overall
## government_integrity 8.667643 9.507172
## overall_score 7.311575 8.824204
## judicial_effectiveness 7.632014 8.684332
## IC.CNST.PRMT.COST.WRH.VAL 6.061821 8.307018
## RESLV.ISV.RCOV.RT 7.614416 8.234132
## property_rights 7.469381 8.062845
## PAY.TAX.TM 6.851884 8.046954
## educ_index 7.842426 7.984656
## RESLV.ISV.DB1519.DFRN 6.789200 7.911569
## TRD.ACRS.BRDR.IMP.TM.DOC.COMP.HR.DB1619.DFRN 6.634510 7.899855
## ENF.CONT.COEN.CTFE.PR 5.105533 7.899438
## PAY.TAX.POST.FIL.XD.0100.DB1719.DFRN 6.222263 7.851505
## TRD.ACRS.BRDR.EXPT.TM.DOC.COMP.HR.DB1619.DFRN 4.875701 7.832973
## IC.REG.COST.PC.FE.ZS 6.729101 7.831477
## financial_freedom 7.254660 7.825643
## IC.CNST.PRMT.TM.DY 5.558626 7.821116
## tax_burden 6.395377 7.775749
## ENF.CONT.COEN.COST.ZS 6.383151 7.611334
## IC.ELC.ACS.COST 6.408847 7.590179
## PAY.TAX.LABR.TAX.CONTR.ZS 4.937864 7.553761
## heuristic_importance
## government_integrity 0.01265745
## overall_score 0.01174818
## judicial_effectiveness 0.01156196
## IC.CNST.PRMT.COST.WRH.VAL 0.01105962
## RESLV.ISV.RCOV.RT 0.01096258
## property_rights 0.01073454
## PAY.TAX.TM 0.01071338
## educ_index 0.01063044
## RESLV.ISV.DB1519.DFRN 0.01053313
## TRD.ACRS.BRDR.IMP.TM.DOC.COMP.HR.DB1619.DFRN 0.01051754
## ENF.CONT.COEN.CTFE.PR 0.01051698
## PAY.TAX.POST.FIL.XD.0100.DB1719.DFRN 0.01045317
## TRD.ACRS.BRDR.EXPT.TM.DOC.COMP.HR.DB1619.DFRN 0.01042849
## IC.REG.COST.PC.FE.ZS 0.01042650
## financial_freedom 0.01041874
## IC.CNST.PRMT.TM.DY 0.01041271
## tax_burden 0.01035231
## ENF.CONT.COEN.COST.ZS 0.01013341
## IC.ELC.ACS.COST 0.01010525
## PAY.TAX.LABR.TAX.CONTR.ZS 0.01005676
## variable
## government_integrity government_integrity
## overall_score overall_score
## judicial_effectiveness judicial_effectiveness
## IC.CNST.PRMT.COST.WRH.VAL IC.CNST.PRMT.COST.WRH.VAL
## RESLV.ISV.RCOV.RT RESLV.ISV.RCOV.RT
## property_rights property_rights
## PAY.TAX.TM PAY.TAX.TM
## educ_index educ_index
## RESLV.ISV.DB1519.DFRN RESLV.ISV.DB1519.DFRN
## TRD.ACRS.BRDR.IMP.TM.DOC.COMP.HR.DB1619.DFRN TRD.ACRS.BRDR.IMP.TM.DOC.COMP.HR.DB1619.DFRN
## ENF.CONT.COEN.CTFE.PR ENF.CONT.COEN.CTFE.PR
## PAY.TAX.POST.FIL.XD.0100.DB1719.DFRN PAY.TAX.POST.FIL.XD.0100.DB1719.DFRN
## TRD.ACRS.BRDR.EXPT.TM.DOC.COMP.HR.DB1619.DFRN TRD.ACRS.BRDR.EXPT.TM.DOC.COMP.HR.DB1619.DFRN
## IC.REG.COST.PC.FE.ZS IC.REG.COST.PC.FE.ZS
## financial_freedom financial_freedom
## IC.CNST.PRMT.TM.DY IC.CNST.PRMT.TM.DY
## tax_burden tax_burden
## ENF.CONT.COEN.COST.ZS ENF.CONT.COEN.COST.ZS
## IC.ELC.ACS.COST IC.ELC.ACS.COST
## PAY.TAX.LABR.TAX.CONTR.ZS PAY.TAX.LABR.TAX.CONTR.ZS
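The `heuristic_importance` column is each variable's mean importance across classes, divided by the total over all variables. A small worked sketch with hypothetical scores (the data frame `toy_imp` and its values are invented for illustration):

```r
# Hypothetical per-class importance scores for three variables (illustration only)
toy_imp <- data.frame(
  `0` = c(10, 6, 4),
  `1` = c(8, 6, 2),
  check.names = FALSE,
  row.names = c("var_a", "var_b", "var_c")
)

# Mean importance across classes, then each variable's share of the total
toy_imp$Overall <- rowMeans(toy_imp)
toy_imp$heuristic_importance <- toy_imp$Overall / sum(toy_imp$Overall)

toy_imp[order(-toy_imp$heuristic_importance), ]
```

Because the shares sum to one, with 109 predictors a perfectly uniform ranking would give each variable about 1/109, or roughly 0.009. The top values of about 0.010 to 0.013 in the table above therefore suggest a fairly flat importance profile.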
The second approach, namely neural networks:
# Normalize predictors to ensure better performance for the neural network
nnet.data<-imputed_data[,-c(1:3,99,113,114)]
preproc <- preProcess(nnet.data, method = c("center", "scale"))
nnet.data <- predict(preproc, nnet.data)
# Ensure that the target variable 'cpi_level' is a factor (ordinal)
nnet.data$cpi_level <- factor(nnet.data$cpi_level, ordered = TRUE)
# Set seed for reproducibility
set.seed(9000)
# Control for cross-validation (you should have this defined already)
control <- trainControl(method = "cv", number = 10)
# Train the neural network using 10-fold cross-validation
# (nnet treats the ordered factor as an ordinary multi-class outcome)
library(nnet)
# Train the Neural Network model with a small number of hidden neurons
nn_model <- train(
  cpi_level ~ .,          # Formula for the model (outcome ~ predictors)
  data = nnet.data,       # Dataset
  method = "nnet",        # Neural Network method
  trControl = control,    # Control for cross-validation
  tuneGrid = expand.grid(size = c(3, 5, 7), decay = c(0.01, 0.001)),  # Adjust size and decay
  trace = FALSE,          # Suppress trace output
  maxit = 200             # Maximum number of iterations
)
# Check the model
print(nn_model)
## Neural Network
##
## 2128 samples
## 109 predictor
## 4 classes: '0', '1', '2', '3'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1915, 1916, 1916, 1915, 1917, 1915, ...
## Resampling results across tuning parameters:
##
## size decay Accuracy Kappa
## 3 0.001 0.8505579 0.7510291
## 3 0.010 0.8792283 0.8006477
## 5 0.001 0.8881748 0.8158830
## 5 0.010 0.8947014 0.8271003
## 7 0.001 0.8957331 0.8287865
## 7 0.010 0.8951821 0.8271879
##
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were size = 7 and decay = 0.001.
library(NeuralNetTools)
var.imp.nn <- garson(nn_model, bar_plot = FALSE)
# Sorting var.imp.nn in decreasing order based on rel_imp
var.imp.nn_sorted <- var.imp.nn[order(-var.imp.nn$rel_imp), , drop = FALSE]
# Add the variable names back as a column (if they were row names)
var.imp.nn_sorted$variable <- rownames(var.imp.nn_sorted)
# Creating the bar plot for variable importance
importance_plot_nn <- plot_ly(var.imp.nn_sorted,
x = ~variable,
y = ~rel_imp,
type = 'bar',
marker = list(color = 'green')) %>%
layout(title = 'Variable Importance from Neural Network',
xaxis = list(title = 'Variables'),
yaxis = list(title = 'Relative Importance'),
barmode = 'group')
# the plot
importance_plot_nn
Displaying the top 20 predictors :
# Print the top 20 predictors
var.imp.nn_sorted %>%
slice_max(order_by = rel_imp, n = 20) %>%
print()
## rel_imp
## government_integrity 0.03311002
## paying_taxes 0.01682678
## RESLV.ISV.RCOV.RT 0.01667900
## ENF.CONT.COEN.CTSP.DB1719 0.01607701
## IC.CRED.ACC.PUBL.CRD.REG.COVR.ZS 0.01546024
## PAY.TAX.PRFT.CP.ZS 0.01519109
## PROT.MINOR.INV.EXT.BUS.DISC.010.XD 0.01433637
## PROT.MINOR.INV.EXT.OWNR.CONT.XD.0100.DB1519 0.01410425
## IC.CNST.PRMT.PROC.NO 0.01407838
## ENF.CONT.COEN.CTAU 0.01383389
## business_freedom 0.01369460
## RESLV.ISV.COPR.03.XD.DB1519 0.01359411
## TRD.ACRS.BRDR.IMP.COST.BRDR.COMP.CD.DB1619 0.01306166
## getting_credit 0.01295903
## PAY.TAX.LABR.TAX.CONTR.ZS 0.01288461
## ENF.CONT.COEN.ATDR 0.01255668
## IC.ELC.COMM.TRFF.CG.01.DB1619 0.01217087
## PAY.TAX.POST.FIL.XD.0100.DB1719.DFRN 0.01210283
## RESLV.ISV.CPI.04.XD.DB1519 0.01191477
## RESLV.ISV.ROPC.03.XD.DB1519 0.01190975
## variable
## government_integrity government_integrity
## paying_taxes paying_taxes
## RESLV.ISV.RCOV.RT RESLV.ISV.RCOV.RT
## ENF.CONT.COEN.CTSP.DB1719 ENF.CONT.COEN.CTSP.DB1719
## IC.CRED.ACC.PUBL.CRD.REG.COVR.ZS IC.CRED.ACC.PUBL.CRD.REG.COVR.ZS
## PAY.TAX.PRFT.CP.ZS PAY.TAX.PRFT.CP.ZS
## PROT.MINOR.INV.EXT.BUS.DISC.010.XD PROT.MINOR.INV.EXT.BUS.DISC.010.XD
## PROT.MINOR.INV.EXT.OWNR.CONT.XD.0100.DB1519 PROT.MINOR.INV.EXT.OWNR.CONT.XD.0100.DB1519
## IC.CNST.PRMT.PROC.NO IC.CNST.PRMT.PROC.NO
## ENF.CONT.COEN.CTAU ENF.CONT.COEN.CTAU
## business_freedom business_freedom
## RESLV.ISV.COPR.03.XD.DB1519 RESLV.ISV.COPR.03.XD.DB1519
## TRD.ACRS.BRDR.IMP.COST.BRDR.COMP.CD.DB1619 TRD.ACRS.BRDR.IMP.COST.BRDR.COMP.CD.DB1619
## getting_credit getting_credit
## PAY.TAX.LABR.TAX.CONTR.ZS PAY.TAX.LABR.TAX.CONTR.ZS
## ENF.CONT.COEN.ATDR ENF.CONT.COEN.ATDR
## IC.ELC.COMM.TRFF.CG.01.DB1619 IC.ELC.COMM.TRFF.CG.01.DB1619
## PAY.TAX.POST.FIL.XD.0100.DB1719.DFRN PAY.TAX.POST.FIL.XD.0100.DB1719.DFRN
## RESLV.ISV.CPI.04.XD.DB1519 RESLV.ISV.CPI.04.XD.DB1519
## RESLV.ISV.ROPC.03.XD.DB1519 RESLV.ISV.ROPC.03.XD.DB1519
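The `rel_imp` values come from `garson()`, which apportions importance according to the absolute connection weights. The sketch below shows one common formulation of Garson's algorithm for a network with a single hidden layer and a single output. The weights and the helper `garson_sketch` are hypothetical, and the NeuralNetTools implementation may differ in detail, in particular in how it handles a four-class output.

```r
# Hypothetical weights for a network with 3 inputs, 2 hidden units, 1 output
w_in <- matrix(c( 0.8, -0.2,
                 -0.4,  0.6,
                  0.1,  0.2),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("x1", "x2", "x3"), c("h1", "h2")))
w_out <- c(h1 = 1.5, h2 = -0.5)

garson_sketch <- function(w_in, w_out) {
  a_in <- abs(w_in)
  # Share of each input in every hidden unit's incoming weights
  share <- sweep(a_in, 2, colSums(a_in), "/")
  # Weight each hidden unit by the size of its connection to the output
  contrib <- sweep(share, 2, abs(w_out), "*")
  # Sum over hidden units and rescale so the importances add up to 1
  rowSums(contrib) / sum(contrib)
}

ri <- garson_sketch(w_in, w_out)
ri
```

Since only absolute weights enter the calculation, the result says how strongly an input is connected to the output, not whether its effect is positive or negative.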
The last machine learning algorithm is the support vector machine:
library(e1071)
# Set seed for reproducibility
set.seed(9000)
# Define the control function for k-fold cross-validation (k = 10)
train_control <- trainControl(method = "cv", number = 10)
# Train the SVM model
svm_model <- train(
  cpi_level ~ .,                        # Formula for the model (outcome ~ predictors)
  data = nnet.data,
  method = "svmRadial",                 # Radial basis function kernel for non-linear SVM
  trControl = train_control,            # Control for cross-validation
  preProcess = c("center", "scale"),    # Preprocess by centering and scaling
  tuneLength = 10                       # Grid search over 10 values of the tuning parameter
)
# Print the summary of the model
print(svm_model)
## Support Vector Machines with Radial Basis Function Kernel
##
## 2128 samples
## 109 predictor
## 4 classes: '0', '1', '2', '3'
##
## Pre-processing: centered (109), scaled (109)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1915, 1916, 1916, 1915, 1917, 1915, ...
## Resampling results across tuning parameters:
##
## C Accuracy Kappa
## 0.25 0.8932974 0.8166285
## 0.50 0.9031588 0.8345985
## 1.00 0.9139726 0.8537235
## 2.00 0.9261904 0.8746506
## 4.00 0.9276078 0.8777353
## 8.00 0.9271471 0.8775576
## 16.00 0.9332748 0.8886084
## 32.00 0.9346810 0.8914972
## 64.00 0.9351461 0.8924387
## 128.00 0.9323314 0.8882008
##
## Tuning parameter 'sigma' was held constant at a value of 0.007229501
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.007229501 and C = 64.
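The `sigma` reported above is the width parameter of the radial basis function kernel. As a minimal sketch, assuming the kernlab parameterisation k(x, y) = exp(-sigma * ||x - y||^2) and two hypothetical standardised observations `a` and `b`:

```r
# Radial basis function kernel: k(x, y) = exp(-sigma * ||x - y||^2)
rbf_kernel <- function(x, y, sigma) {
  exp(-sigma * sum((x - y)^2))
}

sigma_hat <- 0.007229501   # value reported by print(svm_model)

# Two hypothetical standardised observations (illustration only)
a <- c(0.5, -1.0, 2.0)
b <- c(0.0,  1.0, 1.0)

rbf_kernel(a, a, sigma_hat)   # identical points give 1
rbf_kernel(a, b, sigma_hat)   # similarity shrinks as points move apart
```

A small `sigma` such as this one makes the kernel decay slowly with distance, so the cost parameter `C` does most of the work in controlling how closely the decision boundary follows the training data.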
library(vip)
##
## Attaching package: 'vip'
## The following object is masked from 'package:utils':
##
## vi
# Prediction wrapper function for classification outcome
predict_classification <- function(object, newdata) {
as.factor(predict(object, newdata = newdata)) # Ensure predictions are factors for classification
}
# caret has no model-specific importance measure for this SVM, so varImp() returns
# a model-free score per predictor and class based on ROC curve analysis
svm_var_imp <- varImp(svm_model, scale = FALSE)
# Convert the variable importance object to a data frame
svm_var_imp_df <- as.data.frame(svm_var_imp$importance)
# Add the variable names to the data frame
svm_var_imp_df$variable <- rownames(svm_var_imp_df)
# If it's a multiclass model, calculate the mean importance across all classes
if (ncol(svm_var_imp_df) > 1) {
svm_var_imp_df$Overall <- rowMeans(svm_var_imp_df[, -ncol(svm_var_imp_df)])
} else {
svm_var_imp_df$Overall <- svm_var_imp_df[, 1] # Single-class case
}
# Select the top 20 important variables
svm_var_imp_top20 <- svm_var_imp_df %>%
top_n(20, Overall) %>%
arrange(desc(Overall))
# Plot the top 20 important variables using plotly
library(plotly)
importance_plot_svm <- plot_ly(svm_var_imp_top20,
x = ~reorder(variable, Overall),
y = ~Overall,
type = 'bar',
marker = list(color = 'green')) %>%
layout(title = 'Top 20 Important Variables from SVM Model',
xaxis = list(title = 'Variables'),
yaxis = list(title = 'Mean Importance'),
barmode = 'group')
# Show the plot
importance_plot_svm
print(svm_var_imp_top20)
## X0 X1 X2
## government_integrity 0.9999169 1.0000000 0.9775408
## judicial_effectiveness 0.9949035 1.0000000 0.9391250
## property_rights 0.9900022 1.0000000 0.9282840
## overall_score 0.9930456 1.0000000 0.9109129
## overall_score_db 0.9878624 0.9994336 0.8847513
## paying_taxes 0.9776477 0.9965819 0.8581255
## TRD.ACRS.BRDR.EXPT.TM.DOC.COMP.HR.DB1619.DFRN 0.9733969 0.9998047 0.8487976
## trading_borders 0.9677583 0.9988867 0.8509851
## business_freedom 0.9712174 0.9957518 0.8454383
## TRD.ACRS.BRDR.IMP.TM.DOC.COMP.HR.DB1619.DFRN 0.9744524 0.9995508 0.8337755
## IC.ELC.ACS.COST 0.9660160 0.9961034 0.8356530
## IC.REG.PRRT.QUAL.LNDADM.XD.030.DB16 0.9568857 0.9990625 0.8126711
## educ_index 0.9370021 0.9999805 0.8270191
## ENF.CONT.COEN.QUJP.XD 0.9608979 0.9905856 0.8168291
## RESLV.ISV.RCOV.RT 0.9254066 0.9998437 0.8335276
## TRD.ACRS.BRDR.IMP.TM.BRDR.COMP.HR.DB1619.DFRN 0.9609232 0.9883492 0.8166269
## RESLV.ISV.DB1519.DFRN 0.9172558 0.9969921 0.8427828
## IC.REG.PRRT.GEO.COVR.XD.08.DB1619 0.9202523 0.9963475 0.8260552
## investment_freedom 0.9392214 0.9966503 0.8028669
## IC.REG.COST.PC.FE.ZS 0.9489084 0.9904293 0.8033231
## X3
## government_integrity 1.0000000
## judicial_effectiveness 1.0000000
## property_rights 1.0000000
## overall_score 1.0000000
## overall_score_db 0.9994336
## paying_taxes 0.9965819
## TRD.ACRS.BRDR.EXPT.TM.DOC.COMP.HR.DB1619.DFRN 0.9998047
## trading_borders 0.9988867
## business_freedom 0.9957518
## TRD.ACRS.BRDR.IMP.TM.DOC.COMP.HR.DB1619.DFRN 0.9995508
## IC.ELC.ACS.COST 0.9961034
## IC.REG.PRRT.QUAL.LNDADM.XD.030.DB16 0.9990625
## educ_index 0.9999805
## ENF.CONT.COEN.QUJP.XD 0.9905856
## RESLV.ISV.RCOV.RT 0.9998437
## TRD.ACRS.BRDR.IMP.TM.BRDR.COMP.HR.DB1619.DFRN 0.9883492
## RESLV.ISV.DB1519.DFRN 0.9969921
## IC.REG.PRRT.GEO.COVR.XD.08.DB1619 0.9963475
## investment_freedom 0.9966503
## IC.REG.COST.PC.FE.ZS 0.9904293
## variable
## government_integrity government_integrity
## judicial_effectiveness judicial_effectiveness
## property_rights property_rights
## overall_score overall_score
## overall_score_db overall_score_db
## paying_taxes paying_taxes
## TRD.ACRS.BRDR.EXPT.TM.DOC.COMP.HR.DB1619.DFRN TRD.ACRS.BRDR.EXPT.TM.DOC.COMP.HR.DB1619.DFRN
## trading_borders trading_borders
## business_freedom business_freedom
## TRD.ACRS.BRDR.IMP.TM.DOC.COMP.HR.DB1619.DFRN TRD.ACRS.BRDR.IMP.TM.DOC.COMP.HR.DB1619.DFRN
## IC.ELC.ACS.COST IC.ELC.ACS.COST
## IC.REG.PRRT.QUAL.LNDADM.XD.030.DB16 IC.REG.PRRT.QUAL.LNDADM.XD.030.DB16
## educ_index educ_index
## ENF.CONT.COEN.QUJP.XD ENF.CONT.COEN.QUJP.XD
## RESLV.ISV.RCOV.RT RESLV.ISV.RCOV.RT
## TRD.ACRS.BRDR.IMP.TM.BRDR.COMP.HR.DB1619.DFRN TRD.ACRS.BRDR.IMP.TM.BRDR.COMP.HR.DB1619.DFRN
## RESLV.ISV.DB1519.DFRN RESLV.ISV.DB1519.DFRN
## IC.REG.PRRT.GEO.COVR.XD.08.DB1619 IC.REG.PRRT.GEO.COVR.XD.08.DB1619
## investment_freedom investment_freedom
## IC.REG.COST.PC.FE.ZS IC.REG.COST.PC.FE.ZS
## Overall
## government_integrity 0.9943644
## judicial_effectiveness 0.9835071
## property_rights 0.9795715
## overall_score 0.9759896
## overall_score_db 0.9678702
## paying_taxes 0.9572342
## TRD.ACRS.BRDR.EXPT.TM.DOC.COMP.HR.DB1619.DFRN 0.9554510
## trading_borders 0.9541292
## business_freedom 0.9520398
## TRD.ACRS.BRDR.IMP.TM.DOC.COMP.HR.DB1619.DFRN 0.9518324
## IC.ELC.ACS.COST 0.9484689
## IC.REG.PRRT.QUAL.LNDADM.XD.030.DB16 0.9419204
## educ_index 0.9409955
## ENF.CONT.COEN.QUJP.XD 0.9397245
## RESLV.ISV.RCOV.RT 0.9396554
## TRD.ACRS.BRDR.IMP.TM.BRDR.COMP.HR.DB1619.DFRN 0.9385621
## RESLV.ISV.DB1519.DFRN 0.9385057
## IC.REG.PRRT.GEO.COVR.XD.08.DB1619 0.9347506
## investment_freedom 0.9338472
## IC.REG.COST.PC.FE.ZS 0.9332725
overall_score refers to the Economic Freedom Index and overall_score_db to the Ease of Doing Business score. “TRD.ACRS.BRDR.EXPT.TM.DOC.COMP.HR.DB1619.DFRN” is the score for Trading across borders: Time to export: Documentary compliance (hours) (DB16-20 methodology). “TRD.ACRS.BRDR.IMP.TM.DOC.COMP.HR.DB1619.DFRN” is the corresponding score for Trading across borders: Time to import: Documentary compliance (hours) (DB16-19 methodology), which is based on the number of hours required for documentary compliance during the import process. The remaining codes can be looked up in the World Bank glossary.
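The SVM importance values lie close to 1 because, for models without a built-in importance measure, caret documents a filter approach based on the area under the ROC curve of each predictor taken on its own. For multi-class outcomes it works on pairs of classes and keeps the largest area per class. The sketch below shows only the two-class building block, with a hypothetical helper `auc_one_predictor` and invented data:

```r
# AUC of a single predictor for a two-class problem, via the rank-sum identity
auc_one_predictor <- function(x, y) {
  stopifnot(nlevels(y) == 2)
  pos <- y == levels(y)[2]
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  auc <- (sum(rank(x)[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  # Direction does not matter for importance, so fold values below 0.5
  max(auc, 1 - auc)
}

# Hypothetical data: one informative and one uninformative predictor
y <- factor(c(0, 0, 0, 0, 1, 1, 1, 1))
informative   <- c(1, 2, 3, 4, 5, 6, 7, 8)
uninformative <- c(1, 4, 6, 7, 2, 3, 5, 8)

auc_informative   <- auc_one_predictor(informative, y)
auc_uninformative <- auc_one_predictor(uninformative, y)
c(informative = auc_informative, uninformative = auc_uninformative)
```

Scores of this kind measure how well each variable separates the classes by itself, so they do not depend on the fitted SVM and are not directly comparable with the Random Forest or neural network importances.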
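Confusion Matrix for the Random Forest model (note that rf_model was trained on the unscaled predictors, whereas nnet.data is centred and scaled):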
rf_predictions <- predict(rf_model, newdata = nnet.data)
# Generate the confusion matrix using the actual and predicted values
conf_matrix <- confusionMatrix(rf_predictions, nnet.data$cpi_level)
# Print the confusion matrix
print(conf_matrix)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3
## 0 318 1214 435 161
## 1 0 0 0 0
## 2 0 0 0 0
## 3 0 0 0 0
##
## Overall Statistics
##
## Accuracy : 0.1494
## 95% CI : (0.1345, 0.1653)
## No Information Rate : 0.5705
## P-Value [Acc > NIR] : 1
##
## Kappa : 0
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3
## Sensitivity 1.0000 0.0000 0.0000 0.00000
## Specificity 0.0000 1.0000 1.0000 1.00000
## Pos Pred Value 0.1494 NaN NaN NaN
## Neg Pred Value NaN 0.4295 0.7956 0.92434
## Prevalence 0.1494 0.5705 0.2044 0.07566
## Detection Rate 0.1494 0.0000 0.0000 0.00000
## Detection Prevalence 1.0000 0.0000 0.0000 0.00000
## Balanced Accuracy 0.5000 0.5000 0.5000 0.50000
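Every row is assigned to class 0 in the matrix above, although the cross-validated accuracy of the same model was about 0.93. This pattern is what one would expect when a rule learned on raw values is applied to standardised ones, which would be the case here if `rf_model` was fitted on unscaled columns of `imputed_data` and then asked to predict on the centred and scaled `nnet.data`. A minimal sketch of that mechanism, using a hypothetical threshold rule in place of the fitted forest:

```r
# Hypothetical score on a 0-100 scale and a class rule learned on that scale
raw_score <- c(10, 30, 45, 60, 80, 95)
classify <- function(score) {
  factor(ifelse(score > 50, "high", "low"), levels = c("low", "high"))
}

# Predictions on the scale the rule was learned on
pred_raw <- classify(raw_score)

# The same rule applied to centred and scaled values: everything is below 50
scaled_score <- as.vector(scale(raw_score))
pred_scaled <- classify(scaled_score)

table(pred_raw)
table(pred_scaled)
```

If this is indeed the cause, passing the same unscaled columns that were used for training as `newdata` should give a confusion matrix in line with the cross-validated results.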
Confusion Matrix for the Neural Net model:
nnet_predictions <- predict(nn_model, newdata = nnet.data)
# Generate the confusion matrix using the actual and predicted values
conf_matrix <- confusionMatrix(nnet_predictions, nnet.data$cpi_level)
# Print the confusion matrix
print(conf_matrix)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3
## 0 307 2 0 0
## 1 11 1211 4 1
## 2 0 1 429 2
## 3 0 0 2 158
##
## Overall Statistics
##
## Accuracy : 0.9892
## 95% CI : (0.9838, 0.9931)
## No Information Rate : 0.5705
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9821
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3
## Sensitivity 0.9654 0.9975 0.9862 0.98137
## Specificity 0.9989 0.9825 0.9982 0.99898
## Pos Pred Value 0.9935 0.9870 0.9931 0.98750
## Neg Pred Value 0.9940 0.9967 0.9965 0.99848
## Prevalence 0.1494 0.5705 0.2044 0.07566
## Detection Rate 0.1443 0.5691 0.2016 0.07425
## Detection Prevalence 0.1452 0.5766 0.2030 0.07519
## Balanced Accuracy 0.9822 0.9900 0.9922 0.99017
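The accuracy and kappa reported above can be recomputed directly from the confusion matrix. A short base R sketch using the counts printed for the neural network (the object names are chosen for illustration):

```r
# Confusion matrix reported for the neural network
# (rows = prediction, columns = reference)
cm <- matrix(c(307,    2,   0,   0,
                11, 1211,   4,   1,
                 0,    1, 429,   2,
                 0,    0,   2, 158),
             nrow = 4, byrow = TRUE,
             dimnames = list(Prediction = c("0", "1", "2", "3"),
                             Reference  = c("0", "1", "2", "3")))

n <- sum(cm)
observed <- sum(diag(cm)) / n                      # accuracy
expected <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
cohen_kappa <- (observed - expected) / (1 - expected)

round(c(accuracy = observed, kappa = cohen_kappa), 4)
```

Kappa corrects the raw accuracy for the agreement that the class proportions alone would produce, which matters here because class 1 makes up 57% of the sample. Note also that this matrix is computed on the training data, so it is more optimistic than the cross-validated accuracy of about 0.90 reported for the same model.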
Confusion Matrix for the SVM model :
# Predict the class labels on the training data (a held-out test set would give a fairer estimate)
svm_predictions <- predict(svm_model, newdata = nnet.data)
# Generate the confusion matrix using the actual and predicted values
conf_matrix <- confusionMatrix(svm_predictions, nnet.data$cpi_level)
# Print the confusion matrix
print(conf_matrix)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3
## 0 310 2 0 0
## 1 8 1208 5 0
## 2 0 4 427 7
## 3 0 0 3 154
##
## Overall Statistics
##
## Accuracy : 0.9864
## 95% CI : (0.9805, 0.9909)
## No Information Rate : 0.5705
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9774
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3
## Sensitivity 0.9748 0.9951 0.9816 0.95652
## Specificity 0.9989 0.9858 0.9935 0.99847
## Pos Pred Value 0.9936 0.9894 0.9749 0.98089
## Neg Pred Value 0.9956 0.9934 0.9953 0.99645
## Prevalence 0.1494 0.5705 0.2044 0.07566
## Detection Rate 0.1457 0.5677 0.2007 0.07237
## Detection Prevalence 0.1466 0.5738 0.2058 0.07378
## Balanced Accuracy 0.9869 0.9904 0.9876 0.97750